Dear TCCA membership:

We are delighted to announce the online publication of the inaugural set of four papers for TCCA's new publication, Computer Architecture Letters. "Letters" is a quarterly forum for fast publication of new, high-quality ideas in the form of short, critically refereed, technical "letters". Accepted letters are published immediately on our website and in the next available paper issue. Submissions are accepted on a continuing basis. The current turn-around time is 32 days, and we hope to improve this as our review process becomes more efficient. The current acceptance rate is 18.2%.

The titles and abstracts of the inaugural set of letters appear below, and these letters, as well as the call for papers and submission instructions, can be found on the Letters website at http://www.cs.virginia.edu/~tcca/

We hope that you will look forward to each issue as a digest of some of the latest research going on in our field, and we hope that you will submit your early and exciting research results to Letters. We hope that the quick turn-around will encourage this by providing immediate recognition. Since IEEE allows publication in its conferences and journals if there is at least 30% new material, and this seems to be a fairly common rule of thumb, publishing a letter should not prevent researchers from following it with full conference papers or journal articles.

Yale Patt, Editor-in-Chief
Kevin Skadron, Associate Editor-in-Chief
Jean-Luc Gaudiot, TCCA Chair

Papers, volume 1, 2002, available online at http://www.cs.virginia.edu/~tcca/
-------------------------------------------

- B. Towles, W. J. Dally. "Worst-case Traffic for Oblivious Routing Functions."
- C. Álvarez, J. Corbal, E. Salamí, M. Valero. "Initial Results on Fuzzy Floating Point Computation for Multimedia Processors."
- A. Gordon-Ross, S. Cotterell, F. Vahid. "Exploiting Fixed Programs in Embedded Systems: A Loop Cache Example."
- J.-H. Choi, J.-H. Lee, S.-W. Jeong, S.-D. Kim, C. Weems. "A Low Power TLB Structure for Embedded Systems."

Abstracts
---------

- B. Towles, W. J. Dally. "Worst-case Traffic for Oblivious Routing Functions."

This paper presents an algorithm to find a worst-case traffic pattern for any oblivious routing algorithm on an arbitrary interconnection network topology. The linearity of channel loading offered by oblivious routing algorithms enables the problem to be mapped to a bipartite maximum-weight matching, which can be solved in polynomial time for routing functions with a polynomial number of paths. Finding exact worst-case performance was previously intractable, and we demonstrate an example case where traditional characterization techniques overestimate the throughput of a particular routing algorithm by 47%.

- C. Álvarez, J. Corbal, E. Salamí, M. Valero. "Initial Results on Fuzzy Floating Point Computation for Multimedia Processors."

In recent years, the market for mid/low-end portable systems such as PDAs and mobile digital phones has experienced a revolution in both sales volume and features as handheld devices incorporate multimedia applications. This increases the computational demands on these devices, which remain limited in power (and energy) consumption. Instruction memoization is a promising technique for reducing the power consumed by expensive functional units such as the floating-point unit. Unfortunately, this technique could be energy-inefficient for low-end systems due to the additional power consumption of the relatively large tables required. In this paper we present a novel way of understanding multimedia floating-point operations based on the fuzzy computation paradigm: a loss of computational precision can buy performance at the cost of negligible errors in the output. Exploiting the implicit characteristics of media FP computation, we propose a new technique called fuzzy memoization.
Fuzzy memoization expands the capabilities of classic memoization by mapping entries with similar inputs to the same output. We present a case study for an SH4-like processor and report good performance and power-delay improvements with feasible hardware requirements.

- A. Gordon-Ross, S. Cotterell, F. Vahid. "Exploiting Fixed Programs in Embedded Systems: A Loop Cache Example."

Embedded systems commonly execute one program for their lifetime. Designing embedded system architectures with configurable components, such that those components can be tuned to that one program based on a program pre-analysis, can yield significant power and performance benefits. We illustrate such benefits by designing a loop cache specifically with tuning in mind. Our results show a 70% reduction in instruction memory accesses for MIPS and 8051 processors, twice the reduction achieved by a regular loop cache, which translates to good power savings.

- J.-H. Choi, J.-H. Lee, S.-W. Jeong, S.-D. Kim, C. Weems. "A Low Power TLB Structure for Embedded Systems."

We present a new two-level TLB (translation look-aside buffer) architecture that integrates a 2-way banked filter TLB with a 2-way banked main TLB. The objective is to reduce power consumption in embedded processors by distributing the accesses to TLB entries across the banks in a balanced manner. First, an advanced filtering technique is devised to reduce access power by adopting a sub-bank structure. Second, a bank-associative structure is applied to each level of the TLB hierarchy. Simulation results show that the Energy*Delay product can be reduced by about 40.9% compared to a fully-associative TLB, 24.9% compared to a micro-TLB with 4+32 entries, and 12.18% compared to a micro-TLB with 16+32 entries.
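
For readers unfamiliar with the fuzzy memoization idea in the second abstract above, the following is a rough software sketch only, not the authors' hardware design. It assumes that "similar inputs" means single-precision operands that agree once a few low-order mantissa bits are dropped. The names `fuzzy_key` and `FuzzyMemoTable`, the choice of 12 dropped bits, and the unbounded dictionary (a real table would be small and fixed-size) are all illustrative assumptions.

```python
import struct


def fuzzy_key(x: float, dropped_bits: int) -> int:
    """Return the float32 bit pattern of x with the low mantissa bits discarded.

    Assumption: operands fit in single precision (struct raises
    OverflowError for finite values outside the float32 range).
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return bits >> dropped_bits


class FuzzyMemoTable:
    """Hypothetical memo table for a two-operand floating-point operation."""

    def __init__(self, op, dropped_bits: int = 12):
        self.op = op                      # the expensive operation, e.g. division
        self.dropped_bits = dropped_bits  # how much operand precision is ignored
        self.table = {}                   # unbounded here; hardware would be tiny
        self.hits = 0
        self.misses = 0

    def __call__(self, a: float, b: float) -> float:
        # Operands that differ only in the dropped bits share one entry.
        key = (fuzzy_key(a, self.dropped_bits), fuzzy_key(b, self.dropped_bits))
        if key in self.table:
            self.hits += 1
            return self.table[key]        # reuse a result from a similar input
        self.misses += 1
        result = self.op(a, b)
        self.table[key] = result
        return result


if __name__ == "__main__":
    fdiv = FuzzyMemoTable(lambda a, b: a / b, dropped_bits=12)
    fdiv(3.0, 7.0)      # miss: computed and stored
    fdiv(3.0001, 7.0)   # hit: operand matches 3.0 after dropping 12 bits
    fdiv(3.01, 7.0)     # miss: operand differs in the retained bits
    print(fdiv.hits, fdiv.misses)
```

With `dropped_bits=0` this degenerates to classic memoization on exact single-precision operands; raising it trades output accuracy for a higher hit rate, which is the trade-off the abstract describes.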